K-Means Clustering

Data preprocessing

# Importing the dataset
dataset = read.csv('Mall_Customers.csv')
# Keep only column 4 (Annual Income) and column 5 (Spending Score)
X = dataset[4:5]

Using the elbow method to find the optimal number of clusters

set.seed(12)
wcss = vector() # within-cluster sum of squares (WCSS) for each number of clusters
for (i in 1:10) wcss[i] = sum(kmeans(X, i)$withinss)

plot(1:10,
     wcss,
     type = 'b',
     main = 'Elbow Method',
     xlab = 'Number of clusters',
     ylab = 'WCSS')

[Plot: Elbow Method, WCSS against number of clusters]

The elbow method suggests that 5 is the optimal number of clusters for this dataset: beyond k = 5, adding more clusters reduces the WCSS only marginally.
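The quantity on the y-axis of the elbow plot is the WCSS: the sum, over all points, of the squared Euclidean distance from each point to the centroid of its assigned cluster. `kmeans()` reports this per cluster as `withinss` and in total as `tot.withinss`. The minimal sketch below recomputes it by hand on a small synthetic dataset and compares the two values; the data and the helper `manual_wcss()` are hypothetical and only illustrate the definition.

```r
# Minimal sketch: recompute WCSS by hand and compare it with kmeans()'s own value.
# The synthetic data and the helper manual_wcss() are hypothetical.
set.seed(1)
toy <- rbind(matrix(rnorm(40, mean = 0), ncol = 2),
             matrix(rnorm(40, mean = 5), ncol = 2))

manual_wcss <- function(data, cluster) {
  data <- as.matrix(data)
  total <- 0
  for (k in unique(cluster)) {
    members <- data[cluster == k, , drop = FALSE]
    centroid <- colMeans(members)
    # Squared Euclidean distance of each member to its cluster centroid
    total <- total + sum(sweep(members, 2, centroid)^2)
  }
  total
}

fit <- kmeans(toy, centers = 2, nstart = 10)
wcss_by_hand <- manual_wcss(toy, fit$cluster)
print(c(by_hand = wcss_by_hand, kmeans = fit$tot.withinss))
```

Because WCSS can only shrink as k grows (it reaches 0 when every point is its own cluster), the elbow method looks for the k after which the decrease flattens out rather than for a minimum.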

Applying K-Means to the Mall dataset

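set.seed(123)
# Fit k-means with 5 clusters; nstart = 10 keeps the best of 10 random starts.
# The result is stored as km so it does not mask the kmeans() function.
km = kmeans(x = X, centers = 5, iter.max = 300, nstart = 10)
y_kmeans = km$cluster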
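By default, `kmeans()` in base R uses the Hartigan-Wong algorithm. The simpler Lloyd iteration below is a sketch of the same idea, not R's implementation: assign each point to its nearest centroid, move each centroid to the mean of its assigned points, and repeat until no assignment changes. The helper `lloyd_kmeans()` and the toy data are hypothetical.

```r
# Minimal sketch of Lloyd's k-means iteration in base R.
# lloyd_kmeans() and the toy data are hypothetical; this is not the
# Hartigan-Wong algorithm that kmeans() uses by default.
lloyd_kmeans <- function(data, k, iter_max = 100) {
  data <- as.matrix(data)
  # Initialise the centroids with k distinct rows chosen at random
  centers <- data[sample(nrow(data), k), , drop = FALSE]
  cluster <- integer(nrow(data))
  for (iter in seq_len(iter_max)) {
    # Assignment step: squared distance from every point to every centroid
    d <- sapply(seq_len(k), function(j) colSums((t(data) - centers[j, ])^2))
    new_cluster <- max.col(-d, ties.method = "first")
    # Stop once no point changes cluster
    if (all(new_cluster == cluster)) break
    cluster <- new_cluster
    # Update step: move each centroid to the mean of its assigned points
    for (j in seq_len(k)) {
      if (any(cluster == j)) {
        centers[j, ] <- colMeans(data[cluster == j, , drop = FALSE])
      }
    }
  }
  list(cluster = cluster, centers = centers)
}

# Usage on two synthetic, well-separated groups of 30 points each
set.seed(42)
toy2 <- rbind(matrix(rnorm(60, mean = 0), ncol = 2),
              matrix(rnorm(60, mean = 6), ncol = 2))
fit_lloyd <- lloyd_kmeans(toy2, k = 2)
table(fit_lloyd$cluster)
```

A single run can end in a poor local optimum depending on the starting centroids, which is why the notebook's `kmeans()` call sets `nstart = 10` and keeps the run with the lowest total WCSS.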

Visualising the clusters

library(cluster)
clusplot(X,
         y_kmeans,
         lines = 0,
         shade = TRUE,
         color = TRUE,
         labels = 5,
         plotchar = FALSE,
         span = TRUE,
         main = 'Clusters of customers',
         xlab = 'Annual Income',
         ylab = 'Spending Score')
# See help(clusplot.default) for the full list of options

[Plot: Clusters of customers, Annual Income vs. Spending Score]

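The target customers are those with high income and high spending. In this run, cluster 1 contains the high-income, high-spending customers. Cluster numbers are assigned arbitrarily and depend on the random starting points, so the same group may receive a different number in another run.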
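Because the cluster numbering can change between runs, the target group could instead be located from the fitted centres. The sketch below is a heuristic of our own, not part of the original analysis: it standardises each feature across the k centres and picks the centre with the largest combined score. The helper `target_cluster()` and the synthetic data are hypothetical, and the two columns are assumed to be ordered (income, spending score) as in `X`.

```r
# Hedged sketch: pick the "high income, high spending" cluster from the
# fitted centres instead of trusting a fixed cluster number.
# target_cluster() and the synthetic data are hypothetical; columns are
# assumed to be ordered (income, spending score).
target_cluster <- function(fit) {
  # Standardise each feature across the k centres, then add the two scores
  z <- scale(fit$centers)
  unname(which.max(rowSums(z)))
}

# Synthetic stand-in for the Mall data: five groups of 40 points
set.seed(7)
true_centres <- rbind(c(25, 20), c(25, 80), c(55, 50), c(90, 15), c(90, 85))
toy5 <- do.call(rbind, lapply(seq_len(nrow(true_centres)), function(i)
  cbind(rnorm(40, true_centres[i, 1], 5), rnorm(40, true_centres[i, 2], 5))))
colnames(toy5) <- c("income", "spending")

fit_toy5 <- kmeans(toy5, centers = 5, nstart = 25)
target <- target_cluster(fit_toy5)
fit_toy5$centers[target, ]
```

Applied to the notebook's fitted model, the returned index could be used to select the target customers with `X[y_kmeans == target, ]`. Summing standardised scores weights both features equally; a different business rule would need a different selection criterion.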
